Two-locus identity coefficients in pedigrees

Abstract This paper proposes a solution to a long-standing problem concerning the joint distribution of allelic identity by descent between two individuals at two linked loci. Such distributions have important applications across various fields of genetics, and detailed formulas for selected relationships appear scattered throughout the literature. However, these results were obtained essentially by brute force, with no efficient method available for general pedigrees. The recursive algorithm described in this paper, and its implementation in R, allow efficient calculation of two-locus identity coefficients in any pedigree. As a result, many existing procedures and techniques may, for the first time, be applied to complex and inbred relationships. Two such applications are discussed, concerning the expected likelihood ratio in forensic kinship testing, and variances in realized relatedness.


Introduction
The study of genetic relatedness centers around various coefficients of relatedness, each defined as the probability that certain alleles are identical by descent (IBD), i.e. that they originate from the same ancestral allele within the given pedigree. For the alleles of two individuals at a single locus, common coefficients range in complexity from the simple kinship and inbreeding coefficients (Wright 1922) to the detailed identity coefficients which characterize the distribution of IBD states for any pairwise relationship (Jacquard 1974).
The generalization of IBD probabilities to multiple linked loci was pioneered by Haldane (1949), who defined two-locus kinship and inbreeding coefficients and derived explicit formulas in special cases. Seeking a general procedure, he admitted defeat in the case of individuals with related common ancestors. (In fact, the problems Haldane faced here are unsolvable in his formulation, as we demonstrate in Section "Examples.") Weir and Cockerham (1969) outlined a more general method, but their approach is impractical except in small pedigrees. Finally, Thompson (1988) proposed an efficient, recursive algorithm for two-locus kinship coefficients, elucidated and popularized by Weeks and Lange (1992). This algorithm is implemented in the MORGAN software (https://sites.stat.washington.edu/thompson/ Genepi/MORGAN/Morgan.shtml) and also included in the R package ribd featured in the present paper. The latter has the advantage of supporting selfing and pedigrees with inbred founders (Vigeland 2020).
Just as in the single-locus case, the two-locus kinship coefficient is only the simplest in a range of two-locus coefficients. For noninbred pairs, the 9 two-locus IBD coefficients κ ij (ρ), for i, j ∈ {0, 1, 2}, are defined as the probabilities of sharing exactly i alleles IBD at the first locus and j at the second, where ρ denotes the recombination fraction. These coefficients, and their extension to inbred relationships, are the focus of the present article.
Two-locus IBD coefficients have played central roles in a variety of applications over the years. For example, they were important in the development of medical linkage analysis (Bishop and Williamson 1990) and quantitative-trait linkage analysis (Almasy and Blangero 1998). Applications in forensic genetics include match probabilities (Buckleton and Triggs 2006;Bright et al. 2013), kinship testing (Egeland and Sheehan 2008;Egeland and Slooten 2016) and mixture analysis (Dørum et al. 2015) among others. Furthermore, two-locus coefficients encode key information about distributions of realized (or actual, or genomic) relatedness, which is sometimes more relevant than the pedigree-based expectation (Speed and Balding 2015). A considerable body of literature is devoted to estimating variances in realized IBD sharing by integrating suitable two-locus IBD coefficients (Guo 1995(Guo , 1996Weir 2011, 2012;Thompson 2013).
Despite the enduring interest in two-locus IBD coefficients, no efficient algorithm for their calculation has hitherto been described in the literature. Formulas for κ ij (ρ) in special cases were given by Denniston (1975) using a path-counting approach, and subsequent authors have used similar, direct methods to analyze certain classes relationships. Nevertheless, previous works involving two-locus IBD coefficients (or closely related probabilities) have been invariably limited to simple, noninbred relationships.
In this paper, we propose and implement a method for computing pairwise joint two-locus IBD probabilities in any pedigree. This includes the 9 coefficients κ ij (ρ) for noninbred pairs, and also the 81 two-locus condensed identity coefficients for general pairs. The key ingredient is a recursive algorithm for generalized two-locus kinship coefficients, inspired by previous works on single-locus identity coefficients (Weeks and Lange 1988;Lange and Sinsheimer 1992). Using this method, and its implementation in the R package ribd (Vigeland 2020), several existing applications may be extended to general pedigrees, including inbred relationships. Two such applications are explored in the Discussion; one in forensic kinship testing and one in the study of variances in realized relatedness.

Background
In this section, we review the basics of relatedness coefficients and settle our notation.
A pairwise relationship is a triple (a, b, P), where P is a finite, connected pedigree, and a and b are (not necessarily distinct) members of P. We will usually assume that the relationship is nontrivial, i.e. that a and b are connected with at least one path in P, although most of the results also hold in the trivial case. Homologous alleles of a and b are IBD if they descend from the same allele carried by a common ancestor of a and b within P. We often suppress P in our notation, but emphasize that any calculations or concepts based on IBD only make sense in the context of an explicit pedigree. We restrict our attention to diploid species.

Single-locus coefficients
The kinship coefficient φ ab between a and b is the probability that a random allele from a is IBD with a random allele from b at the same autosomal locus. The inbreeding coefficient f c of a child c between a and b is defined by f c = φ ab . We say that c is inbred if f c > 0, and completely inbred if f c = 1. Pedigree founders are usually assumed to be noninbred, but this may be relaxed in some applications (Vigeland 2020).
Note about notation: Relatedness coefficients are conventionally written with the individuals as subscripts rather than superscripts. We use superscripts in this paper, reserving subscripts for other indices. When the context is clear, we may drop the superscripts and simply write φ or f . Figure 1a shows all possible patterns of IBD between the four alleles carried by two individuals at a single autosomal locus. The 15 patterns S * 1 , . . . , S * 15 are called the (single-locus) detailed identity states. The expected relative frequencies of these states in a given relationship (a, b, P) are denoted δ 1 , . . . , δ 15 , and referred to as the detailed identity coefficients of a and b.

Noninbred relationships
We say that a relationship R = (a, b, P) is noninbred if neither a nor b is inbred in P. Note, however, that other members of P may be inbred. For noninbred relationships, only Δ 9 , Δ 8 , Δ 7 may be nonzero-the probabilities that a and b share, respectively, 0, 1, and 2 alleles IBD. Following Thompson (1975), we denote these by κ 0 , κ 1 , κ 2 . We refer to them as IBD coefficients, to distinguish them from the previously defined identity coefficients.
A noninbred relationship R is called unilineal if κ 2 = 0, and bilineal if κ 2 > 0 (Cotterman 1940). Furthermore, R is direct if a is a direct descendant of b or vice versa, and collateral otherwise. The following simple fact will be used in later proofs: Proof. For the first implication, suppose for a contradiction that R is bilineal and that a is an ancestor of b. If f , m are the parents of b, we can then assume w.l.o.g. that a is an ancestor of (or equal to) f . In order for κ 2 to be nonzero, a must be related to m as well. But then m and f are related, contradicting the assumption that b is noninbred. The same argument applies if b is an ancestor of a, hence we conclude that R cannot be direct.
The second implication follows from the fact that if a is a founder of P, the only relatives of a in P are the descendants of a. □ Finally, we recall the close connection between φ and κ at a single locus. The following well-known facts, which we include for easy reference, are straightforward from the definitions (see also Thompson 2000). b) If R is unilineal, then κ is determined by φ, by the formula κ = (1 − 4φ, 4φ, 0). c) If R is bilineal, then κ is determined by φ together with the parental kinships. If f , g are the parents of a and m, n the parents of b (cf. Lemma 1), then: (1)

Two-locus coefficients
All of the single-locus coefficients described above can be generalized to multiple loci by considering the joint IBD probabilities at the loci. Crucially, such multilocus coefficients are functions of the recombination rates between the loci. In the present paper, we restrict our attention to two autosomal loci, L 1 and L 2 , with recombination rate ρ ∈ [0, 0.5], which we assume to be the same for males and females.
The two-locus kinship coefficient φ ab 11 (ρ) is the probability that a random gamete from a and a random gamete from b contain IBD alleles at both L 1 and L 2 . If a and b are clear from the context, we will write this coefficient as φ 11 (ρ), or simply φ 11 , with the understanding that it is always a function of ρ. As first observed by Haldane (1949), we have φ 11 (0) = φ and φ 11 (0.5) = φ 2 for any relationship.
If both a and b are noninbred, we define the two-locus IBD coefficient κ ab ij (ρ), for i, j ∈ {0, 1, 2}, to be the probability that a and b share exactly i IBD alleles at L 1 and exactly j IBD alleles at L 2 . As with φ 11 , we may drop a, b and ρ from the notation and simply write κ ij . Clearly, κ ij = κ ji so the coefficients form a symmetric matrix: If ρ = 0, the absence of recombination implies that any pedigree path between a and b yields IBD alleles either at both or none of the loci. Hence κ ii (0) = κ i for i = 0, 1, 2 and κ ij (0) = 0 if i ≠ j. At the other extreme, ρ = 0.5, the two loci segregate independently of each other, thus κ ij (0.5) = κ i κ j for all i, j ∈ {0, 1, 2}.
Another important observation is that for any ρ, the row and column sums of K(ρ) are κ: Indeed, this is a simple consequence of the law of total probability applied to the first locus (row sums) and the second locus (column sums). The relations (2) are especially powerful for unilineal relationships, where κ 2 = 0. Then also κ i2 = κ 2i = 0 for i = 0, 1, 2, and we obtain after simplification, Since κ 1 = κ 11 (0), it follows that in the unilineal case, the entire matrix K is uniquely determined by κ 11 . Finally, we generalize to the case where a and b may be inbred.
as the probability that the condensed identity states at L 1 and L 2 are S i and S j , respectively. For any fixed ρ these coefficients form a symmetric 9 × 9 matrix: As in the noninbred case, the row sums, and also the column sums, of D are the single-locus coefficients Δ 1 , . . . , Δ 9 .

Examples
The purpose of this section is to illustrate some of the complications that arise with linked loci. In light of Proposition 1, it is natural to wonder if there exists a direct relationship between the two-locus coefficients φ 11 and K. Especially for unilineal relationships, where the one-locus situation boils down to the formula κ 1 = 4φ, one might hope for a similar identity relating κ 11 and φ 11 . Unfortunately, as the following two examples demonstrate, no such formula can exist.
The first well-known example (e.g. Section 4.5 of Thompson 2000) shows that two relationships may have different κ 11 , but the same φ 11 . An interpretation of this is that the relationships are theoretically distinguishable given genetic data from the two individuals, but not with data from their child alone.
Example 1 (Different κ 11 , same φ 11 ). The relationships of grandparentgrandchild (G) and half-siblings (H) have different two-locus IBD functions, These functions are easily verifiable by direct calculation.
Next, we show that the opposite situation is also possible. This particular example appears to be original, although it seems likely that the effect it illustrates has been known to previous authors.
Example 2 (Same κ 11 , different φ 11 ). Consider an outbred parent-offspring relationship (PO), and compare it with half-sibs whose shared parent is completely inbred (H-i). It is easy to see that both of these satisfy κ 11 (ρ) = 1 for all ρ; in other words, they have a constant IBD matrix In the H-i case, the IBD alleles are always in cis, i.e. on the same haplotype, since they come from the same parent. For PO, on the other hand, the alleles are in cis in the child, but not necessarily in the parent. This difference leads to distinct φ 11 functions (see Example 4 for calculations): A remarkable consequence of Example 2 is that PO and H-i cannot be distinguished by means of (unphased) genetic data from the two individuals themselves, but can be so given data from their child alone.
For our final example, we return to Haldane's problem mentioned in the introduction. In our notation this amounts to the following: Given a relationship (a, b, P) where a and b have two different common ancestors P, Q in P, find a formula for φ ab 11 expressed by the single-locus coefficient φ PQ and the two-locus coefficient φ PQ 11 between the ancestors. Failing to do so, Haldane remarked: It is possible that [these coefficients] do not give all the needful information.
As it turns out, Haldane's intuition was correct: In general, φ ab 11 depends not only on φ PQ and φ PQ 11 , but also on the two-locus IBD matrix K PQ . Here is an illustration: Example 3. Suppose a and b are full siblings whose parents P and Q are unilineally related. Then the two-locus kinship φ 11 of a and b is given by In particular, φ 11 cannot be expressed solely by φ PQ and φ PQ 11 .
The formula (8) is obtained as follows. The first two terms cover the cases where the emitted gametes are nonrecombinant, either originating from the same parent (probability (ρ 2 + ̅ ρ 2 )( 1 2 ̅ ρ) 2 ) or one from each (2φ PQ 11 ( 1 2 ̅ ρ) 2 ). The middle term ̅ ρρφ PQ is the probability of IBD at both loci when one gamete is recombinant and the other not. Finally, when both gametes are recombinant, the alleles may come from the same parent at each locus, i.e. paternal alleles at L 1 and maternal at L 2 , or vice versa (total probability 1 8 ρ 2 ), or one from each parent at each locus ( ρ 2 32 κ PQ 11 ). Altogether, this gives the claimed formula.
For an explicit example, consider the case of siblings whose parents are half-siblings. Inserting the expressions for φ H 11 and κ H 11 from Example 1 into (8), we obtain after simplification: Denniston (1975) introduced a refinement of the κ ij coefficients, taking into account the phase of the IBD alleles. He found efficient formulas for some of these extended coefficients, but had to resort to tedious path tracing for the remaining ones. One contribution of the current paper is to enable recursive calculation of all of these phased coefficients, which in turn provide the κ ij 's. Starting with the coefficient κ 11 , this naturally splits into four phased components:

Phased components of two-locus coefficients
The superscripts signify if the IBD alleles are in cis or in trans in a and b, respectively. The underlying IBD patterns are shown in the top row of Fig. 2. Turning to bilineal relationships, the coefficients κ 21 , κ 12 , and κ 22 have similar decompositions: where the superscripts indicate whether the IBD alleles form the same haplotype(s) in a and b, or if a recombination has happened. See Fig. 2 (rows 2 and 3) for illustrations. Our notation differs slightly from that of Denniston (1975). For each pattern in Fig. 2, it is straightforward to find the probability that a and b emit gametes with IBD alleles at both loci. For instance, in the diagram corresponding to κ cc 11 (top-left), this amounts to ( 1 2 ̅ ρ) 2 , since both individuals must emit the same haplotype unrecombined. With similar calculations for the other patterns in the top row, we finally obtain a two-locus analogue of the single-locus formula φ = 1 4 κ 1 (Proposition 1b) for unilineal relationships: For bilineal relationships, we must include all the patterns in Fig. 2, producing the formula where R = ρ 2 + ̅ ρ 2 . Exploiting the symmetries κ h 21 = κ h 12 and κ r 21 = κ r 12 , the expression can be compactified to  (10) produces the formulas for φ 11 given in (7).

Generalized kinship coefficients
A key idea introduced by Karigl (1981), was to express identity coefficients in terms of generalized kinship coefficients. Recursive formulas for such coefficients were given by Karigl and further developed by other authors, allowing efficient computation of identity coefficients.
Passing to the two-locus situation, it is natural to seek a similar generalization of two-locus kinship coefficients. Thompson (1988) used special cases of this to compute φ 11 , but to the best of our knowledge no general treatment has been given. Indeed, this will prove to be the main ingredient in computing the two-locus coefficients K and D.
We begin by reviewing the single-locus case.

Generalized single-locus kinship coefficients
The pairwise kinship coefficient φ generalizes naturally to three or more individuals. For example, Karigl (1981) used the notation φ abc to denote the probability that homologous alleles drawn from individuals a, b, and c are all IBD. We will adopt a more flexible notation, close to that of Weeks and Lange (1988), writing the above three-person coefficient as Φ ([a, b, c]). A crucial idea of Weeks and Lange (1988) was to consider multiple groups of IBD alleles simultaneously. For example, the coefficient Φ ([a, b], [c, d]) denotes the probability that if one allele is sampled at random from each of a, b, c, and d, then the alleles from a and b are IBD, and the ones from c and d are IBD, but different from those in the first group. More generally, a generalized kinship pattern is a finite collection of blocks of pedigree members, The associated generalized kinship coefficient Φ(G) is the probability that if one allele is sampled from each individual (with replacement if the individual is repeated), then the alleles within each block are all IBD, while alleles from different blocks are not IBD. A few simple properties of generalized kinship coefficients are worth noticing. Firstly, Φ(G) is invariant under permutations of the blocks, and also under permutations of the individuals within a block. If any block of G contains two different founders, or indeed any unrelated individuals, then Φ(G) = 0. Moreover, if any individual occurs in more than two blocks, this also implies Φ(G) = 0.

Generalized two-locus kinship coefficients
Let L 1 and L 2 be fixed autosomal loci with recombination rate ρ. If G 1 and G 2 are generalized single-locus kinship patterns, we write G 1 n G 2 for the simultaneous occurrence of G 1 and G 2 at loci L 1 and L 2 , respectively. The corresponding generalized two-locus kinship coefficient is the probability Φ(G 1 n G 2 ) : = P(G 1 at L 1 and G 2 at L 2 ).
As with all the previous two-locus coefficients, this probability is a function of ρ. Whenever a kinship pattern G 1 n G 2 involves multiple alleles from the same individual, we must specify whether or not these belong to the same gamete. To this end, we add a segregation index superscript to each allele (Thompson 1988;Lange and Sinsheimer 1992). Note that the indices themselves are irrelevant; only equality between indices matters, and only if attached to the same individual. For example, the previously defined two-locus kinship φ ab 11 can be expressed as Φ([a 1 , b 1 ] n [a 1 , b 1 ]), but also as, e.g.
Even though the segregation index carries no inherent meaning, the following convention is useful. If the involved gamete is a transmission from a parent x to a child y, we may use y as a segregation index and write x y . This notation is particularly efficient in implementations and is used extensively in the recursion formulas in Appendix A. It has, however, one notable shortcoming that seems to have gone unnoticed by previous authors. If y is the result of selfing of x, there are two different gametes segregating from x to y. Since these require different labels, the notation must be augmented in this case, e.g. y1 and y2. We note that the software MORGAN, which implements the algorithm of Thompson (1988), does not support pedigrees with selfing.
A recursive algorithm for computing Φ(G 1 n G 2 ) for any G 1 , G 2 in any pedigree, is given in Appendix A.

Two-locus identity coefficients
In this section, we show how the generalized two-locus kinship coefficients allow us to calculate two-locus identity coefficients. We split the presentation into three steps, in increasing order of pedigree complexity, which have different computational demands. We start with the unilineal case, where a simple trick provides efficient calculation of the matrix K.

Unilineal relationships
Let R = (a, b, P) be a unilineal relationship, and K = K(ρ) its matrix of two-locus IBD coefficients. Recall from equation (3) that K is uniquely determined by κ 11 .
As evident from Example 1, we cannot generally compute κ 11 directly from φ 11 . Moreover, equation (10) shows why: the phased components of κ 11 contribute unequally to φ 11 . However, it turns out that we can recover κ 11 by considering a slightly more complex, generalized kinship coefficient, constructed to balance the contributions.

[a 2 ], [b 2 ]} is the two-locus IBD pattern in
Proof The crucial point is that H * has the same probability under each of the four cis/trans combinations underlying κ 11 , shown in the top row of Fig. 2. To see this, note that if the IBD alleles in a are in cis, the two gametes from a dictated by H * (cf. Fig. 3) occur with probability 1 2 ̅ ρ and 1 2 ρ, respectively. On the other hand, if the alleles are in trans, these probabilities are simply switched. Hence the total probability of a's gametes is always 1 4 ρ̅ ρ. Clearly, the same holds for b, and it follows that Φ(H * ) = ( 1 4 ̅ ρρ) 2 (κ cc 11 + κ ct 11 + κ tc 11 + κ tt 11 ) = 1 16 ̅ ρ 2 ρ 2 κ 11 as claimed. □ Theorem 1 enables efficient calculation of κ 11 (ρ), and thereby the entire matrix K(ρ), for any ρ > 0. The endpoint ρ = 0 is trivial, as previously explained.
The four phased coefficients κ cc 11 , . . . , κ tt 11 can also be computed using a similar technique as that in Theorem 1. The idea is to find four different generalized IBD patterns H 1 , . . . , H 4 whose coefficients are linear expressions in κ cc 11 , . . . , κ tt 11 . By choosing H 1 , . . . , H 4 such that the resulting system of linear equations has full rank, this can then be solved for the phased coefficients. Details are given in Appendix B.

Bilineal relationships
Moving on to bilineal relationships, we now assume (by Lemma 1) that a and b are nonfounders of P. Let f , m denote the parents of a, and g, n the parents of b, with the understanding that some of these may coincide. To simplify matters we assume that a ≠ b, noting that if a = b, then K is trivially determined by κ 22 (ρ) = 1 for all ρ.
Before continuing, it is worth noting why Theorem 1 no longer holds in the bilineal case. When one or both loci may share 2 alleles IBD, the expression for Φ(H * ) given in the proof of that theorem, gains several additional terms which obstruct an explicit solution for κ 11 . Our approach for computing K, and also the 9 × 9 two-locus identity matrix D in the next section, is inspired by the method used by Lange and Sinsheimer (1992) to calculate the single-locus coefficients δ 1 , . . . , δ 15 . They noted that, for example, the detailed identity state S * 10 corresponds to (in our notation) the generalized kinship pattern , which by means of a recursive algorithm enabled Lange and Sinsheimer to compute δ ab 10 in any pedigree. In the same fashion one may define generalized IBD patterns J 9 , . . . , J 15 corresponding to all detailed states S * 9 , . . . , S * 15 in which a and b are noninbred (cf. Fig. 1a): (16) (We will complete this list in the next section by including patterns corresponding to S * 1 , . . . , S * 8 .) Let B 0 , B 1 B 2 , be the partition of indices {9, . . . , 15} according to the number of IBD alleles in each of the detailed states: With these definitions the single-locus IBD coefficients could be obtained by the formula Switching to the two-locus situation, our aim is to give a two-locus version of (18). To compute a two-locus IBD coefficient, say κ 00 , we might proceed as follows: κ 00 is the probability that a and b share 0 alleles IBD at both loci, i.e. that both loci are in state S * 15 . In other words, κ 00 = Φ(J 15 n J 15 ). Similarly, κ 20 is the probability that the state at L 1 is either S * 9 or S * 12 (the states where a and b share 2 alleles), while L 2 is in state S * 15 , thus κ 20 = Φ(J 9 n J 15 ) + Φ(J 12 n J 15 ). By this reasoning, we obtain the following general result: Theorem 2. Suppose a and b are nonfounders in P, and let J 9 , . . . , J 15 be the IBD patterns defined in (16). The two-locus IBD coefficients κ ij , for i, j = 0, 1, 2 are then given by.
A complete overview of the detailed two-locus states and their corresponding patterns J r n J s , is given in Table 1. This table also shows how to obtain the phased versions by further partitioning the sums given in Theorem 2. For example, equation (19) dictates that κ 22 = Φ(J 9 n J 9 ) + Φ(J 9 n J 12 ) + Φ(J 12 n J 9 ) + Φ(J 12 n J 12 ). (20) Fig. 3. The two-locus IBD pattern H * = {[a 1 , a 2 , b 1 , b 2 ]} n {[a 1 , b 1 ], [a 2 ], [b 2 ]} used to compute κ 11 for unilineal relationships. Two gametes are emitted from each of a and b. At the first locus, all four alleles are IBD. At the second locus, exactly one gamete of a is IBD with exactly one gamete of b.
Inspecting the corresponding states (top two rows of Table 1) it is evident that the first and fourth terms of (20) contribute to κ h 22 , while the other two belong to κ r 22 .

General relationships
The method of the previous section generalizes immediately to the full-blown matrix D of 81 two-locus identity coefficients between a and b, which we now allow to be inbred. The main challenge is the volume of cases, as there are now 15 · 15 = 225 detailed two-locus states to consider, each corresponding to a generalized two-locus kinship coefficient of the form Φ(J r n J s ), r, s = 1, . . . , 15.
First of all, we complete the list (16) by adding the patterns corresponding to the states S * 1 , . . . , S * 8 for inbred a and/or b: In the same manner as (17), we define subsets C 1 , . . . , C 9 ⊆ {1, . . . , 15} as the indices corresponding to the condensed states, i.e. so that Δ i = r∈C i δ r .
We can then give the most general result of this paper, providing an implementation-friendly formula for the 81 two-locus identity coefficients of any pairwise relationship (a, b, P).
The assumption that a, b are nonfounders can easily be circumvented by extending the pedigree before applying the theorem. For example, if a is a founder of P, let P ′ be the pedigree resulting from adding both of a's parents, as unrelated founders. Theorem 3 can then be applied to (a, b, P ′ ).

Implementation
The algorithms described in this paper are implemented in the R package ribd (Vigeland 2020), which is part of the ped suite collection of packages for pedigree analysis in R (Vigeland 2021). Detailed explanations and many examples are included in the documentation of the functions twoLocusKinship, twoLocusIBD and twoLocusIdentity.
In light of the extensive recursions needed to calculate generalized two-locus kinship coefficients, care should be taken to alleviate the computational burden. A naive application of Theorem 3 requires the calculation of 15 2 = 225 generalized two-locus kinship coefficients Φ(J r n J s ) to obtain the complete matrix D. However, this number can be almost halved by exploiting linear dependencies. Let n ij = |C i | · |C j | be the number of terms in equation (23), i.e.
A particular feature of the ped suite is the support of founder inbreeding, i.e. the assignment of nonzero inbreeding coefficients to any pedigree founders. As shown in Vigeland (2020, Section 6.2), such founder inbreeding generally leads to ill-defined multilocus IBD coefficients, except in cases of complete inbreeding. All twolocus functions in ribd support completely inbred founders (and give an error if encountering partially inbred founders). For example, the H-i pedigree featured in Example 2 can be analyzed as follows, after loading the ribd package in R: # Half siblings with completely inbred parent x = halfSibPed() founderInbreeding(x, ids = 2) = 1 # A two-locus kinship coefficient twoLocusKinship(x, ids = 4:5, rho = 0.25) [1] 0.140625 Table 1. Formulas for the phased two-locus IBD coefficients in bilineal relationships.
# The two-locus IBD matrix (same for any rho) twoLocusIBD(x, ids = 4:5, rho = 0.25) The output of the last command (not shown) is the matrix K in equation (6).
If either of the two individuals, say a, is a founder, then the algorithm described in Section "General relationships" requires that we extend the pedigree by adding the parents of a before applying Theorem 3. However, this cannot be done adequately if a is completely inbred; hence in this particular case founder inbreeding is not supported in the current implementation.
Of particular interest is the work of Almasy and Blangero (1998) (AB98 in the following), which provides extensive tables of formulas for κ 11 in unilineal relationships, and IBD correlation formulas (which are simple functions of the κ ij 's) for many bilineal cases. Some mistakes in these formulas were discovered as a result of the comparison. Given the high impact of AB98 we briefly record these here: In Table 4 of AB98, the correlation coefficient of "Double second cousins (type A)" should be 1 − 48 7 θ + 134 7 θ 2 − 200 7 θ 3 + 172 7 θ 4 − 80 7 θ 5 + 16 7 θ 6 (they use θ to denote the recombination fraction). Furthermore, Table 6 has a misprint in the entry for "First cousin and second cousin," where the coefficient of θ 6 should be 8,240, not 8,420.

Discussion
Considering the enduring interest in two-locus IBD probabilities, the lack of general algorithms may seem surprising. One explanation may be that for simple relationships explicit formulas can be obtained by direct calculation. Below we briefly discuss two applications of two-locus coefficients which, with the methods of the present paper, can now be extended to larger classes of pedigrees.
In forensic kinship testing, a hypothesized relationship R between two individuals is typically tested by evaluating the likelihood ratio LR = P(G | R)/P(G | unrelated), where G denotes the genotypes at a predetermined set of markers. Egeland and Slooten discovered that if G is interpreted as a random variable, the expectation E[LR] is independent of allele frequencies, and can be expressed by a remarkably simple formula (Egeland and Slooten 2016). In the case of two markers, let M (1) be the matrix where n 1 is the number of alleles at the first marker, and define M (2) similarly for the second marker. For convenience, we use 0-indexing for the entries of these matrices, e.g.

M
(1) 11 = (n 1 + 3)/4. Now, suppose (κ ij (ρ)) are the two-locus IBD coefficients of R, where ρ is the recombination fraction between the markers. Furthermore, suppose that (κ ′ ij (ρ)) are the coefficients of the true relationship R ′ , which may be different from R. (We assume that both R and R ′ are noninbred.) The expected likelihood ratio is then the following sum (Egeland and Slooten 2016, eq. 2.18), In Fig. 4, we reproduce and expand a result of Egeland and Slooten (2016, Example 3.4). Here, E[LR] is shown as a function of ρ for various relationships (assuming R ′ = R) with two markers of 10 and 15 equally frequent alleles. The three lowest curves, corresponding to grandparent-grandchild, halfsiblings and uncle-nephew relationships, agree with the lower part of Fig. 2 in Egeland and Slooten (2016). Two further relationships have been added, namely quadruple half-first cousins (QHFC) and simultaneous half-siblings and half-second cousins (HS+HSC). These were chosen because they are genetically close to the first three, and also to illustrate the utility of the algorithm presented in this paper, since exact formulas for κ ij (ρ) are not available (and nontrivial to work out) for these relationships. An interesting observation from Fig. 4 is that the ranking of the relationships according to E[LR] depends on the distance between the markers.
Another powerful application of two-locus coefficients is in the study of realized relatedness, i.e. the actual IBD segments shared by a pair of related individuals. Guo (1995Guo ( , 1996 showed that for noninbred individuals, the variance in proportion of genome-shared IBD can be expressed as a double integral involving two-locus IBD probabilities, and used this to compute the said variance in special cases. The same technique can be used to compute other similar variances. For instance, for a given pair of noninbred relatives, let (k 0 , k 1 , k 2 ) be the actual proportions of their genomes where they share 0,1,2 alleles IBD, respectively. For each j = 0, 1, 2 we have E(k j ) = κ j . For a chromosome of length L, we can write an indicator variable and K x is the number of IBD alleles (0, 1 or 2) at locus x. The variance formula Var(Y) = E(Y 2 ) − E(Y) 2 then gives: Here, ρ xy is the recombination rate between loci x and y.
Assuming a Poisson crossover process, Haldane's map func- y) is the genetic distance (in Morgan) between x and y. Hill and Weir (2011) used an approach resembling (27) to compute the variances of k j and other measures of realized relatedness. However, only special cases of noninbred relationships were considered, in which the required two-locus IBD probabilities could be obtained by direct, but tedious calculations. In contrast, the algorithm and implementation presented here allow these variances to be also obtained in complex pedigrees. This includes inbred relationships, where variances in the realized proportions in states S 1 , . . . , S 9 may be defined analogously to (27). Further details and examples, including numerical validations of Hill and Weir (2011) are given in the documentation of the ribd package.

Conclusion
This paper presents an algorithm for computing joint IBD probabilities at two linked loci, called two-locus identity coefficients, in any pedigree. Previous work in this area have focused on simple cases of noninbred relationships, where explicit formulas can be obtained by brute force. In contrast, the method described here applies to any pairwise relationship, both noninbred and inbred. The inbred case requires as many as 81 two-locus coefficients, which may seem like a daunting task. However, they can all be expressed in terms of generalized two-locus kinship coefficients, for which a recursive algorithm is given. All methods, including numerous examples, are implemented in the R package ribd, which runs on all common platforms and is freely available from CRAN. As a result, a variety of methods and applications, previously restricted to simple, special cases, may now be applied and explored in general pedigrees.
The first two terms of (A6) cover the possibilities where the alleles a T1 ∪ a T3 emitted from a, all originate from the father f , or all from the mother m. The next two terms consider the situations when all of a T1 come from f and all of a T3 from m, or vice versa. The C 3 terms account for the cases where the set a T1 includes alleles from both f and m (implying that a is inbred), while the alleles in a T3 originate either all from f or all from m. The D 3 terms are similar, but with the loci switched. The last term covers the remaining cases when both sets a T1 and a T3 contain alleles from both f and m.